ANALYSIS OF VARIANCE

Biostatistics Support and Research Unit

Germans Trias i Pujol Research Institute and Hospital (IGTP)
Badalona

March 27, 2025

Summary

1. Analysis of variance

2. One-Way Analysis of variance

3. Two-Way Analysis of variance

4. Repeated Mesures Analysis of variance

5. Final messages

Compare two independent groups

\[ H_0: \mu_1 = \mu_2 \]

\[ H_1: \mu_1 \ne \mu_2 \]

Characteristic A
N = 301
B
N = 301
p-value2
Weight (Kg) 46.8 (11.7) 52.8 (12.0) 0.058
1 Mean (SD)
2 Welch Two Sample t-test

From 2 to 3 groups or more…

\[ H_0: \mu_1 = \mu_2 = \mu_3 \]

\[ H_1: \mu_1 \ne \mu_2 \ne \mu_3 \]

Characteristic A
N = 301
B
N = 301
C
N = 301
Weight (Kg) 46.7 (8.5) 52.2 (9.7) 41.2 (8.7)
1 Mean (SD)

ONE-WAY ANOVA

ANOVA


Analysis of Variance (ANOVA) is a statistical method used to compare the means of three or more groups to determine if there are significant differences between them.


Conditions:

  • Independence
  • Normality
  • Homogeneity of Variances

Variance

Variance is a measure of dispersion that quantifies how much a set of values deviates from their mean.

\[ \sigma^2=\frac{\sum_{i=1}^{N}{(x_i-\mu)^2}}{N} \]

\[ S^2=\frac{\sum_{i=1}^{n}{(x_i-\bar{x})^2}}{(n-1)} \]

The standard deviation is the square root of the variance.

Analysis of Variance


Analysis of variance (ANOVA) is based on the decomposition of total variability into two components:


Total Variance = Between-Group Variance + Within-Group Variance


Between-Group Variance: how much the group means deviate from the grand mean, weighted by the number of subjects in each group.


Within-Group Variance: how much individual observations deviate from their respective group means.

Analysis of Variance


Decomposition of total variability is expressed as:

\[\sum_{j=1}^{g}\sum_{i=1}^{n_j}(x_{ij}-\bar{X})^2 = \sum_{j=1}^{g}n_j(\bar{x_j}-\bar{X})^2 + \sum_{j=1}^{g}\sum_{i=1}^{n_j}(x_{ij}-\bar{x_j})^2\] where

  • \(g\) is the number of groups
  • \(n_j\) is the number of subjects in group \(j\)
  • \(x_{ij}\) is the value of the i-th subject in group \(j\)
  • \(\bar{x_j}\) is the mean of group \(j\)
  • \(\bar{X}\) is the GRAND mean

Let’s plot: Total Variance

Show how each observation deviates from then grand mean.

\[\sum_{j=1}^{3}\sum_{i=1}^{30}(x_{ij}-\bar{X})^2\]

Let’s plot: Between-Group Variance

Show the difference between each group’s mean and the overall mean.

\[\sum_{j=1}^{3}n_j(\bar{x_j}-\bar{X})^2\]

Let’s plot: Within-Group Variance

Show how each observation deviates from its group’s mean.

\[\sum_{j=1}^{3}\sum_{i=1}^{30}(x_{ij}-\bar{x_j})^2\]

Analysis of Variance


\[\begin{array}{|c|c|c|c|c|} \hline \textbf{Source} & \textbf{Sum of Squares (SS)} & \textbf{df} & \textbf{Mean Square (MS)} & \textbf{F-statistic} \\ \hline Between-Groups & SS_B = \sum_{j=1}^{g} n_j (\bar{x}_j - \bar{X})^2 & g - 1 & MS_B = \frac{SS_B}{g-1} & F = \frac{MS_B}{MS_W} \\ \hline Within-Groups (Error) & SS_W = \sum_{j=1}^{g} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 & N - g & MS_W = \frac{SS_W}{N-g} & \\ \hline Total & SS_T = \sum_{i=1}^{N} (x_i - \bar{X})^2 & N - 1 & & \\ \hline \end{array}\]
  • \(g\) is the number of groups,
  • \(n_j\) is the number of observations in group \(j\),
  • \(N\) is the total number of observations,
  • \(\bar{x}_j\) is the mean of group \(j\),
  • \(\bar{X}\) is the GRAND mean.

The p-value is obtained from the F-distribution with \((g-1, N-g)\) degrees of freedom.

Fisher distribution

Let’s plot: No effect

Let’s tabulate: No effect

\(H_0: \mu_1 = \mu_2 = \mu_3\)
\(H_1: \mu_1 \ne \mu_2 \ne \mu_3\)



The probability of obtaining a \(F=\frac{MS_b}{MS_w}\) as extreme as the observed (or more extreme) under the assumption that the null hypothesis is true is 0.1123655. We can’t reject the null hypothesis.

ANOVA Table
Terms df SS MSQ F P-value
Between-groups 2 512.5434 256.2717 2.241856 0.1123655
Within-groups 87 9945.1673 114.3123 NA NA

Let’s plot: large effects

Let’s tabulate: large effects

\(H_0: \mu_1 = \mu_2 = \mu_3\)
\(H_1: \mu_1 \ne \mu_2 \ne \mu_3\)



The probability of obtaining a \(F=\frac{MS_b}{MS_w}\) as extreme as the observed (or more extreme) under the assumption that the null hypothesis is true is <0.0001. We can reject the null hypothesis.

ANOVA Table
Terms df SS MSQ F P-value
Between-groups 2 6005.04 3002.5200 22.14413 1.683683e-08
Within-groups 87 11796.32 135.5899 NA NA

No vs large effects

Large differences?


ANOVA only tells us that at least one group is different (without identifying which ones)

  • Which specific pairs of groups show differences?
  • Are these differences practically significant, or just statistically significant?

Post-hoc


A post-hoc test is a statistical procedure used after an ANOVA to determine which specific groups differ when the overall test finds a significant effect.


Post-hoc tests control for multiple comparisons to avoid inflating the Type I error rate.


  • Tukey HSD: equal variances and moderately conservative.
  • Bonferroni: equal variances and very conservative.
  • Dunnett: only vs control group, equal variances and moderately conservative.

Conditions: normality

  • The dependent variable (response) should be approximately normally distributed in each group.

  • ANOVA is robust to this assumption when groups have similar sample sizes

  • Can be checked using QQ-plots or the Shapiro-Wilk test.

Conditions: normality


Shapiro-Wilk test


  • \(H_0: \text{Data is Normally distributed}\)

  • \(H_1: \text{Data is not Normally distributed}\)


It’s a highly sensitive test to sample size.

  • For small samples may lack power to detect deviations.
  • For large samples may flag trivial departures as significant.

Conditions: Homogeneity of Variances

  • The variance of the response variable should be approximately equal across groups (Homoscedasticity).

  • Can be checked using different type of plots AND the Levene’s test or Bartlett’s test.

Conditions: Homogeneity of Variances


Levene’s test


  • \(H_0: \text{All groups have equal variances}\)

  • \(H_1: \text{At least one group has a variance different from the others}\)

Uses absolute deviations from the mean/median to assess data dispersion. Is robust to non-normality.

Conditions: Homogeneity of Variances


Bartlett test


  • \(H_0: \text{All groups have equal variances}\)

  • \(H_1: \text{At least one group has a variance different from the others}\)

Based on the likelihood-ratio test. It requires normality.

Hands-on ONE-WAY ANOVA

Guinea Pigs: data

The Effect of Vitamin C dose on Tooth Growth in Guinea Pigs

library(dplyr) # Data managment
library(gtsummary) # Summary of results
library(ggplot2) # Graphics

data(ToothGrowth) # Data

ToothGrowth <- ToothGrowth |>
    mutate(dose=factor(dose))  # dose as factor

ToothGrowth |> 
    dplyr::select(-supp) |>
    gtsummary::tbl_summary(
        by=dose,
        type = all_continuous() ~ "continuous2", 
        statistic=all_continuous()~c("{mean} ({sd})",
        "{median} ({p25}, {p75})"),
        digits = all_continuous() ~ 2,
        label = list(len ~ "Length (cm)"))  
            
# Plot
ggplot(ToothGrowth, aes(x = dose, y =len,color=dose)) +
  geom_point(size = 4)+
  geom_boxplot()+
  labs(x = "Dose", y = "Length (cm)") +
  theme_minimal() +
  theme(legend.position = "none")
Characteristic 0.5
N = 20
1
N = 20
2
N = 20
Length (cm)


    Mean (SD) 10.61 (4.50) 19.74 (4.42) 26.10 (3.77)
    Median (Q1, Q3) 9.85 (7.15, 13.00) 19.25 (16.00, 23.45) 25.95 (23.45, 28.35)

Guinea Pigs: normality

#QQplot
ggplot(ToothGrowth, aes(sample = len)) +
  geom_qq() +
  geom_qq_line() +
  labs(title = "Q-Q Plots for Each Group", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  facet_wrap(~dose) +  
  theme_minimal() 

#Shapiro-Wilk
ToothGrowth |>
  group_by(dose) |>
  summarise(p_value = shapiro.test(len)$p.value)


# A tibble: 3 × 2
  dose  p_value
  <fct>   <dbl>
1 0.5     0.247
2 1       0.164
3 2       0.902

Guinea Pigs: homocedasticity

# Plot
ggplot(ToothGrowth, aes(x =dose, y = len, fill = dose)) +
  geom_violin(alpha = 0.5) +
  geom_jitter(width = 0.2, size = 2, alpha = 0.7) +
  theme_minimal() +
  labs(title = "Violin+Jittered: Variance Visualization",
       x = "Dose",
       y = "Length (cm)") +
  theme(legend.position = "none")

# Levene Test
library(car)  
leveneTest(len ~ dose, data = ToothGrowth)

# Bartlett Test
bartlett.test(len ~ dose, data = ToothGrowth)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  0.6457 0.5281
      57               

    Bartlett test of homogeneity of variances

data:  len by dose
Bartlett's K-squared = 0.66547, df = 2, p-value = 0.717

Guinea Pigs: anova

# ANOVA: dependent ~ independent
model_anova <- aov(len ~ dose, data = ToothGrowth)  

# Arrange results
results_anova <- broom::tidy(model_anova)
resultats_anova$term<-c("Between-groups","Within-groups")
names(resultats_anova)<-c("Terms","df","SS","MSQ","F","P-value")

# Tabulate results
results_anova |>
  gt::gt() |>
  gt::fmt_number(decimals = 3) |>
  gt::tab_header(title = "Guinea Pigs: 1-way Anova") 


Guinea Pigs: 1-way Anova
Terms df SS MSQ F P-value
Between-groups 2.000 2,426.434 1,213.217 67.416 0.000
Within-groups 57.000 1,025.775 17.996 NA NA

The probability of obtaining a \(F=\frac{MS_b}{MS_w}\) as extreme as the observed (or more extreme) under the assumption that the null hypothesis is true is <0.0001. We can reject the null hypothesis that all means are equal.

Guinea Pigs: Analysis of Variance

Guinea Pigs: Analysis of Variance

Guinea Pigs: post-hoc

library(stats)

# Tukey's Honest Significant Difference Test
TukeyHSD(model_anova)

# Bonferroni
pairwise.t.test(ToothGrowth$len, ToothGrowth$dose, p.adjust.method = "bonferroni")


  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = len ~ dose, data = ToothGrowth)

$dose
        diff       lwr       upr    p adj
1-0.5  9.130  5.901805 12.358195 0.00e+00
2-0.5 15.495 12.266805 18.723195 0.00e+00
2-1    6.365  3.136805  9.593195 4.25e-05

    Pairwise comparisons using t tests with pooled SD 

data:  ToothGrowth$len and ToothGrowth$dose 

  0.5     1      
1 2.0e-08 -      
2 4.4e-16 4.3e-05

P value adjustment method: bonferroni 

Guinea Pigs: post-hoc

# Dunnett's Test
library(multcomp)
summary(glht(model_anova, linfct = mcp(dose = "Dunnett"))) 



     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Dunnett Contrasts


Fit: aov(formula = len ~ dose, data = ToothGrowth)

Linear Hypotheses:
             Estimate Std. Error t value Pr(>|t|)    
1 - 0.5 == 0    9.130      1.341   6.806 1.34e-08 ***
2 - 0.5 == 0   15.495      1.341  11.551  < 1e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Guinea Pigs: marginal means

library(emmeans) 

# Estimated marginal means
emm <- emmeans(model_anova, ~ dose)
print(emm)

# Pairwise comparisons with Tukey adjustment
pairs(emm, adjust = "tukey")
 dose emmean    SE df lower.CL upper.CL
 0.5    10.6 0.949 57     8.71     12.5
 1      19.7 0.949 57    17.84     21.6
 2      26.1 0.949 57    24.20     28.0

Confidence level used: 0.95 
 contrast        estimate   SE df t.ratio p.value
 dose0.5 - dose1    -9.13 1.34 57  -6.806  <.0001
 dose0.5 - dose2   -15.49 1.34 57 -11.551  <.0001
 dose1 - dose2      -6.37 1.34 57  -4.745  <.0001

P value adjustment: tukey method for comparing a family of 3 estimates 

Guinea Pigs: marginal means

# Plot
emmip(emm, ~ dose) +  
  theme_minimal() + 
  labs(title = "Estimated Marginal Means with CI", 
       y = "Estimated Mean Length", 
       x = "Dose")


# Plot estimated marginal means with confidence intervals
ggplot(as.data.frame(emm),  aes(x = dose, y = emmean, ymin = lower.CL, ymax = upper.CL)) +
  # Mean points
  geom_point(size = 4, color = "blue") +  
  # Confidence intervals
  geom_errorbar(width = 0.2, color = "blue") +  
  labs(title = "Marginal Means with 95% CIs",
      x = "Dose",
      y = "Estimated Tooth Length") +
  theme_minimal() 

Guinea Pigs: residuals

# Extract Standardized residuals and fitted values
ToothGrowth$resid <- rstandard(model_anova)  
ToothGrowth$fitted <- fitted(model_anova)

# Normality
ggplot(ToothGrowth, aes(sample = resid)) +
  geom_qq() +
  geom_qq_line() +
  labs(title = "Q-Q Plot of Residuals", 
       x = "Theoretical Quantiles", 
       y = "Sample Quantiles") +
  theme_minimal() 

# Residuals vs Fitted plot to check variance patterns
ggplot(ToothGrowth, aes(x = fitted, y = resid))+
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal() 

Guinea Pigs: Heterogeneity

Welch’s correction

library(WRS2)
welch<-t1way(len ~ dose, data = ToothGrowth)  
print(welch)
Call:
t1way(formula = len ~ dose, data = ToothGrowth)

Test statistic: F = 60.026 
Degrees of freedom 1: 2 
Degrees of freedom 2: 21.53 
p-value: 0 

Explanatory measure of effect size: 0.88 
Bootstrap CI: [0.79; 1]

White’s correction

library(rstatix)
white<-anova_test(len ~ dose, data = ToothGrowth, white.adjust = TRUE)
print(white)
ANOVA Table (type II tests)

  Effect DFn DFd      F       p p<.05
1   dose   2  57 66.129 1.4e-15     *

TWO-WAY ANOVA

From 1 factor to 2 factors or more…

Main effect of Factor A:

\(H_0: \mu_{A1} = \mu_{A2} = \mu_{A3}\)
\(H_1: \mu_{A1} \ne \mu_{A2} \ne \mu_{A3}\)


Main effect of Factor B:

\(H_0: \mu_{B1} = \mu_{B2} = \mu_{B3}\)
\(H_1: \mu_{B1} \ne \mu_{B2} \ne \mu_{B3}\)


Interaction effect (A × B):

\(H_0: \text{NO interaction between A and B}\)
\(H_1: \text{Interaction between A and B}\)

TWO-WAY ANOVA


Two-way Analysis of Variance (ANOVA) is a statistical method used to compare the means across levels of two independent factors. It tests for significant differences in means due to each factor individually (main effects) and whether the two factors interact, meaning the effect of one factor depends on the level of the other (interaction effect).


Conditions:

  • Independence
  • Normality
  • Homogeneity of Variances

Analysis of Variance


Total Variance = Between-Group A Variance + Between-Group B Variance + Interaction-Group Variance + Within-Groups Variance


Between-Group A Variance: Measures how much the group A means deviate from the grand mean, weighted by the number of subjects in each group


Between-Group B Variance: Measures how much the group B means deviate from the grand mean, weighted by the number of subjects in each group


Interaction Variance: Measures how much the combined effect of Factors A and B deviates from what we would expect if the effects were purely additive.
\(SS_{AxB}=\sum_{i=1}^{a}\sum_{j=1}^{b}n_{ij}(\bar{x_{ij}}-\bar{x_{i}}-\bar{x_{j}}-\bar{X})\)


Within-Group Variance: Measures how much individual observations deviate from their respective group means.

Analysis of Variance


\[\begin{array}{|l|c|c|c|c|c|} \hline \textbf{Source} & \textbf{SS} & \textbf{df} & \textbf{MS} & \textbf{F} & \textbf{p-value} \\ \hline Factor A & SS_A & df_A & MS_A = \frac{SS_A}{df_A} & F_A = \frac{MS_A}{MS_{error}} & p_A \\ Factor B & SS_B & df_B & MS_B = \frac{SS_B}{df_B} & F_B = \frac{MS_B}{MS_{error}} & p_B \\ Interaction (A×B) & SS_{AxB} & df_{AxB} & MS_{AxB} = \frac{SS_{AB}}{df_{AxB}} & F_{AxB} = \frac{MS_{AxB}}{MS_{error}} & p_{AxB} \\ Error & SS_{error} & df_{error} & MS_{error} = \frac{SS_{error}}{df_{error}} & - & - \\ Total & SS_{total} & df_{total} & - & - & - \\ \hline \end{array}\]

Hands-on TWO-WAY ANOVA

Guinea Pigs: data reloaded

The Effect of Vitamin C dose and Supplement type on Tooth Growth in Guinea Pigs

library(dplyr) # Data managment
library(gtsummary) # Summary of results
library(ggplot2)# Graphics

data(ToothGrowth) # data

ToothGrowth <- ToothGrowth |>
    mutate(dose=factor(dose),  # dose as factor
           supp=factor(supp))  # supp as factor

ToothGrowth |>
  tbl_strata(
    strata = supp,  .tbl_fun =  ~ .x |>
        tbl_summary(by = dose, 
        missing = "no",
        type = all_continuous() ~ "continuous2", 
        statistic=all_continuous()~c("{mean} ({sd})",
        "{median} ({p25}, {p75})"),
        digits = all_continuous() ~ 2,
        label = list(len ~ "Length (cm)"))|>
    .header = "**{strata}**, N = {n}"
  )
Characteristic
OJ, N = 30
VC, N = 30
N 0.5
N = 10
1
N = 10
2
N = 10
N 0.5
N = 10
1
N = 10
2
N = 10
Length (cm) 30


30


    Mean (SD)
13.23 (4.46) 22.70 (3.91) 26.06 (2.66)
7.98 (2.75) 16.77 (2.52) 26.14 (4.80)
    Median (Q1, Q3)
12.25 (9.70, 16.50) 23.45 (20.00, 25.80) 25.95 (24.50, 27.30)
7.15 (5.80, 11.20) 16.50 (15.20, 17.30) 25.95 (23.30, 29.50)

Guinea Pigs: data reloaded

The Effect of Vitamin C dose and Supplement type on Tooth Growth in Guinea Pigs

# Plot
ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
  geom_point(size = 4, alpha = 0.3) +
  geom_boxplot(aes(x = dose, group = interaction(dose, supp)), width = 0.4, alpha = 0.5) +
  facet_wrap(~supp) +  # Separate plots for OJ and VC
  labs(x = "Dose", y = "Length (cm)") +  
  theme_minimal() 

Guinea Pigs: normality

#QQplot
ggplot(ToothGrowth, aes(sample = len)) +
  geom_qq() +
  geom_qq_line() +
  facet_wrap(~dose*supp) +  
  labs(title = "Q-Q Plots for Each Group", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal() +

# A tibble: 6 × 3
# Groups:   dose [3]
  dose  supp  p_value
  <fct> <fct>   <dbl>
1 0.5   OJ      0.182
2 0.5   VC      0.170
3 1     OJ      0.415
4 1     VC      0.270
5 2     OJ      0.815
6 2     VC      0.919

Guinea Pigs: homocedasticity

# Plot
ggplot(ToothGrowth, aes(x =dose, y = len, fill = dose)) +
  geom_violin(alpha = 0.5) +
  geom_jitter(width = 0.2, size = 2, alpha = 0.7) +
  theme_minimal() +
  facet_wrap(~supp) +
  labs(title = "Violin+Jittered: Variance Visualization",
       x = "Dose",
       y = "Length (cm)") +
  theme(legend.position = "none")

# Levene Test
library(car)  
leveneTest(len ~ dose*supp, data = ToothGrowth)

# Bartlett Test
bartlett.test(len ~ interaction(dose, supp), data = ToothGrowth)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  5  1.7086 0.1484
      54               

    Bartlett test of homogeneity of variances

data:  len by interaction(dose, supp)
Bartlett's K-squared = 6.9273, df = 5, p-value = 0.2261

Guinea Pigs: anova

# ANOVA: dependent ~ independent
model_anova_2w <- aov(len ~ dose*supp, data = ToothGrowth)  

# Arrange results
results_anova_2w <- broom::tidy(model_anova_2w)
results_anova_2w$term<-c("Between-groups: Dose","Between-groups: Supp","Interaction","Within-groups")
names(results_anova_2w)<-c("Terms","df","SS","MSQ","F","P-value")

# Tabulate results
results_anova_2w |>
  gt::gt() |>
  gt::fmt_number(decimals = 3) |>
  gt::tab_header(title = "Guinea Pigs: 2-way Anova") 
Guinea Pigs: 2-way Anova
Terms df SS MSQ F P-value
Between-groups: Dose 2.000 2,426.434 1,213.217 92.000 0.000
Between-groups: Supp 1.000 205.350 205.350 15.572 0.000
Interaction 2.000 108.319 54.160 4.107 0.022
Within-groups 54.000 712.106 13.187 NA NA

We reject the null hypothesis that all means are equal across dose groups.
We also reject the null hypothesis that all means are equal across supplement groups.
Furthermore, we reject the null hypothesis for the interaction effect, indicating that the effect of dose depends on the supplement type.

Guinea Pigs: marginal means

# Estimated marginal means
emm <- emmeans(model_anova_2w, ~ dose*supp)
print(emm)

pairs(emm, adjust = "tukey")
 dose supp emmean   SE df lower.CL upper.CL
 0.5  OJ    13.23 1.15 54    10.93     15.5
 1    OJ    22.70 1.15 54    20.40     25.0
 2    OJ    26.06 1.15 54    23.76     28.4
 0.5  VC     7.98 1.15 54     5.68     10.3
 1    VC    16.77 1.15 54    14.47     19.1
 2    VC    26.14 1.15 54    23.84     28.4

Confidence level used: 0.95 
 contrast                estimate   SE df t.ratio p.value
 dose0.5 OJ - dose1 OJ      -9.47 1.62 54  -5.831  <.0001
 dose0.5 OJ - dose2 OJ     -12.83 1.62 54  -7.900  <.0001
 dose0.5 OJ - dose0.5 VC     5.25 1.62 54   3.233  0.0243
 dose0.5 OJ - dose1 VC      -3.54 1.62 54  -2.180  0.2640
 dose0.5 OJ - dose2 VC     -12.91 1.62 54  -7.949  <.0001
 dose1 OJ - dose2 OJ        -3.36 1.62 54  -2.069  0.3187
 dose1 OJ - dose0.5 VC      14.72 1.62 54   9.064  <.0001
 dose1 OJ - dose1 VC         5.93 1.62 54   3.651  0.0074
 dose1 OJ - dose2 VC        -3.44 1.62 54  -2.118  0.2936
 dose2 OJ - dose0.5 VC      18.08 1.62 54  11.133  <.0001
 dose2 OJ - dose1 VC         9.29 1.62 54   5.720  <.0001
 dose2 OJ - dose2 VC        -0.08 1.62 54  -0.049  1.0000
 dose0.5 VC - dose1 VC      -8.79 1.62 54  -5.413  <.0001
 dose0.5 VC - dose2 VC     -18.16 1.62 54 -11.182  <.0001
 dose1 VC - dose2 VC        -9.37 1.62 54  -5.770  <.0001

P value adjustment: tukey method for comparing a family of 6 estimates 

Guinea Pigs: marginal means

# Plot estimated marginal means with confidence intervals

ggplot(as.data.frame(emm), aes(x = dose, y = emmean, color = supp, group = supp)) +
  geom_point(size = 4) +
  geom_line() +
  geom_errorbar(aes(ymin = emmean - SE, ymax = emmean + SE), width = 0.2) +
  scale_x_discrete(labels = c("0.5", "1", "2")) +
  labs(x = "Dose", y = "Estimated Marginal Mean", color = "Supplement") 
  theme_minimal() 

Guinea Pigs: residuals

# Extract residuals and fitted values

ToothGrowth$resid <- rstudent(model_anova_2w)  
ToothGrowth$fitted <- fitted(model_anova_2w)

# Normality

ggplot(ToothGrowth, aes(sample = resid)) +
  geom_qq() +
  geom_qq_line() +
  labs(title = "Q-Q Plot of Residuals", 
       x = "Theoretical Quantiles", 
       y = "Sample Quantiles") +
  theme_minimal()

# Residuals vs Fitted plot to check variance patterns

ggplot(ToothGrowth, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal() 

REPEATED MEASURES ANOVA

From 1 measures to 2 measures or more…

In dependent samples, the same subjects are measured several times under different conditions.

Main effect of Time:

\(H_0: \mu_{T1} = \mu_{T2} = \mu_{T3}\)
\(H_1: \mu_{T1} \ne \mu_{T2} \ne \mu_{T3}\)


Main effect of Group:

\(H_0: \mu_{Control} = \mu_{Treatment}\)
\(H_1: \mu_{Control} \ne \mu_{Treatment}\)


Interaction effect (Time × Group):

\(H_0: \text{NO interaction between Time and Group}\)
\(H_1: \text{Interaction between Time and Group}\)

Analysis of variance


Repeated Measures Analysis of Variance (ANOVA) is a statistical method used to compare means across levels of an independent factor (e.g., study group) while accounting for within-subject dependencies over time.


Conditions:

  • Normality
  • Sphericity: homogeneity of Variances

Analysis of Variance


Total Variance = Between-Groups Variance + Between-Time Variance + Between-Subjects Variance + Within-Subjects Variance


Between-Group Variance (Group): Measures how much the group means deviate from the grand mean. This is the effect of the group factor.


Between-Time Variance (Time): Measures how much the means at different time points deviate from the grand mean. This captures how much the dependent variable changes over time.


Between-Subjects Variance (Subjects): Measures how much the individual means deviate from the grand mean. This captures how each subject’s mean response differs from the grand mean.


Within-Subjects Variance (Error): This is the remaining variance within subjects that cannot be explained by time, group, or individual difference.

Analysis of variance


\[\begin{array}{|l|c|c|c|c|c|} \hline \textbf{Source of Variation} & \textbf{SS} & \textbf{df} & \textbf{MS} & \textbf{F} & \textbf{p-value} \\ \hline Group & SS_{Group} & df_{Group} & MS_{Group} & F_{Group} & p_{Group} \\ Subjects & SS_{Subjects} & df_{Subjects} & MS_{Subjects} & & \\ Time & SS_{time} & df_{time} & MS_{time} & F_{time} & p_{time} \\ Error & SS_{error} & df_{error} & MS_{error} & & \\ \hline Total & SS_{total} & df_{total} & & & \\ \hline \end{array}\]

Conditions: Sphericity


Mauchly’s Test for Sphericity


  • \(H_0: \text{Homogeneity of variances}\)

  • \(H_1: \text{No homogeneity of variances}\)

Sphericity is the condition where the variances of the differences between all combinations of related groups (levels) are equal.

The violation of sphericity is serious, causing an increase in the Type I error rate.

Hands-on REPEATED MEASURES ANOVA

WeightLoss: data

Data on weight loss and self esteem over three months, for three groups of individuals: Control, Diet and Diet + Exercise.

library(tidyr)
library(carData)
data("WeightLoss")

WeightLoss <- WeightLoss |>
    filter(group!="DietEx") |>  # Keep Control and Diet groups 
    mutate(group =factor(group),  # Group as factor  
    subject = factor(row_number())) |> # Create identifier as factor
    dplyr::select(-c(se1,se2,se3)) # Discard self esteem

WeightLoss_long <- WeightLoss |>
  pivot_longer(
    cols = starts_with("wl"),  # Select weight columns
    names_to = "Time",             # New column for time points
    values_to = "Weight"           # New column for weight values
  ) |>
  mutate(Time = factor(gsub("wl", "", Time)))  # Time as factor

WeightLoss: data

WeightLoss_long |> 
  dplyr::select(-subject) |>
  tbl_strata(
    strata = group,  .tbl_fun =  ~ .x |>
        tbl_summary(by = Time, 
        missing = "no",
        type = Weight ~ "continuous2", 
        statistic=all_continuous()~c("{mean} ({sd})",
        "{median} ({p25}, {p75})"),
        digits = all_continuous() ~ 2,
        label = list(Weight ~ "Diff in Kg")) )
Characteristic
Control
Diet
1
N = 12
2
N = 12
3
N = 12
1
N = 12
2
N = 12
3
N = 12
Diff in Kg





    Mean (SD) 4.50 (1.00) 3.33 (1.07) 2.08 (1.16) 5.33 (1.37) 3.92 (1.38) 2.25 (1.14)
    Median (Q1, Q3) 4.50 (4.00, 5.00) 3.00 (2.50, 4.00) 2.00 (1.00, 3.00) 5.50 (4.00, 6.50) 4.00 (3.00, 5.00) 2.00 (1.00, 3.00)

WeightLoss: data

# Plot
ggplot(WeightLoss_long, aes(x = Time, y = Weight , color = group)) +
  geom_point(size = 4, alpha = 0.3) +
  geom_boxplot(aes(x = Time, group = interaction(Time, group)), width = 0.4, alpha = 0.5) +
  facet_wrap(~group) + 
  labs(x = "Time", y = "Diff in Kg") +
  theme_minimal() 

WeightLoss: normality

#QQplot
ggplot(WeightLoss_long, aes(sample = Weight)) +
  geom_qq() +
  geom_qq_line() +
  facet_wrap(~Time*group) +  
  labs(title = "Q-Q Plots for Each Group",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal()

WeightLoss: anova

# ANOVA: dependent ~ independent
model_anova_rm <- aov(Weight ~ Time + group + Error(subject/Time), data = WeightLoss_long)  

summary(model_anova_rm)

Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   5.01   5.014   1.478  0.237
Residuals 22  74.64   3.393               

Error: subject:Time
          Df Sum Sq Mean Sq F value Pr(>F)    
Time       2  90.86   45.43   98.86 <2e-16 ***
Residuals 46  21.14    0.46                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WeightLoss: anova

# ANOVA: dependent ~ independent
model_anova_rm <- aov(Weight ~ Time + group + Error(subject/Time), data = WeightLoss_long)  

# Arrange results
results_anova_rm <- broom::tidy(model_anova_rm)
results_anova_rm$term<-c("Group","Residuals","Time","Residual")
names(results_anova_rm)<-c("Stratum","Term","df","SS","MSQ","F","P-value")


# Tabulate results
results_anova_rm |>
  gt::gt() |>
  gt::fmt_number(decimals = 3) |>
  gt::tab_header(title = "Weigth: repeated measure Anova") 
Weigth: repeated measure Anova
Stratum Term df SS MSQ F P-value
subject Group 1.000 5.014 5.014 1.478 0.237
subject Residuals 22.000 74.639 3.393 NA NA
subject:Time Time 2.000 90.861 45.431 98.861 0.000
subject:Time Residual 46.000 21.139 0.460 NA NA

We can’t reject the null hypothesis that all means are equal across groups.


We reject the null hypothesis that all means are equal across Time.

WeightLoss: homocedasticity

Mauchly’s Test for Sphericity

library(ez)

anova_test <- ezANOVA(  
  data = WeightLoss_long,
  dv = Weight,      # Dependent variable
  wid = subject,    # Repeated measure (subject)
  within = Time,    # Within-subject factor
  between = group,  # Between-subject factor
  detailed = TRUE
)

anova_test[2:3]
$`Mauchly's Test for Sphericity`
      Effect         W         p p<.05
3       Time 0.9061593 0.3553429      
4 group:Time 0.9061593 0.3553429      

$`Sphericity Corrections`
      Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
3       Time 0.9142099 6.779951e-16         * 0.9928239 4.540108e-17         *
4 group:Time 0.9142099 2.327443e-01           0.9928239 2.313940e-01          

GG: Greenhouse-Geisser epsilon; HFe: Huynh-Feldt epsilon.

WeightLoss: marginal means

emm_results <-emmeans(model_anova_rm, ~ Time | group)

pairs(emm_results, adjust = "tukey")
group = Control:
 contrast      estimate    SE df t.ratio p.value
 Time1 - Time2     1.29 0.196 46   6.601  <.0001
 Time1 - Time3     2.75 0.196 46  14.053  <.0001
 Time2 - Time3     1.46 0.196 46   7.452  <.0001

group = Diet:
 contrast      estimate    SE df t.ratio p.value
 Time1 - Time2     1.29 0.196 46   6.601  <.0001
 Time1 - Time3     2.75 0.196 46  14.053  <.0001
 Time2 - Time3     1.46 0.196 46   7.452  <.0001

P value adjustment: tukey method for comparing a family of 3 estimates 

WeightLoss: marginal means

library(afex) # Properly accounts for repeated measures

model_afex <- aov_car(Weight ~ Time + group + Error(subject/Time), data = WeightLoss_long)

emm_results <-emmeans(model_afex, ~ Time | group)

pairs(emm_results, adjust = "tukey")
group = Control:
 contrast estimate    SE df t.ratio p.value
 X1 - X2      1.17 0.251 22   4.655  0.0003
 X1 - X3      2.42 0.313 22   7.726  <.0001
 X2 - X3      1.25 0.253 22   4.938  0.0002

group = Diet:
 contrast estimate    SE df t.ratio p.value
 X1 - X2      1.42 0.251 22   5.652  <.0001
 X1 - X3      3.08 0.313 22   9.857  <.0001
 X2 - X3      1.67 0.253 22   6.584  <.0001

P value adjustment: tukey method for comparing a family of 3 estimates 

WeightLoss: marginal means

library(afex) 

model_afex <- aov_car(Weight ~ Time + group + Error(subject/Time), data = WeightLoss_long)

emm_results <-emmeans(model_afex, ~ group | Time)

pairs(emm_results, adjust = "tukey")
Time = X1:
 contrast       estimate    SE df t.ratio p.value
 Control - Diet   -0.833 0.490 22  -1.701  0.1030

Time = X2:
 contrast       estimate    SE df t.ratio p.value
 Control - Diet   -0.583 0.504 22  -1.156  0.2599

Time = X3:
 contrast       estimate    SE df t.ratio p.value
 Control - Diet   -0.167 0.470 22  -0.355  0.7263

WeightLoss: marginal means

# Plot estimated marginal means with confidence intervals

as.data.frame(emm_results) |> 
ggplot(aes(x = Time, y = emmean, color = group, group = group)) +
  geom_point(size = 3) +
  geom_line() +
  geom_errorbar(aes(ymin = emmean - SE, ymax = emmean + SE), width = 0.2) +
  labs(x = "Time", y = "Estimated Mean Weight") +
  theme_minimal() 

WeightLoss: anova + interaction

# ANOVA: dependent ~ independent
model_anova_rm <- aov(Weight ~ Time * group + Error(subject/Time), data = WeightLoss_long)  

# Arrange results
results_anova_rm <- broom::tidy(model_anova_rm)
results_anova_rm$term<-c("Group","Residuals","Time","Time:Group","Residual")
names(results_anova_rm)<-c("Stratum","Term","df","SS","MSQ","F","P-value")


# Tabulate results
results_anova_rm |>
  gt::gt() |>
  gt::fmt_number(decimals = 3) |>
  gt::tab_header(title = "Weigth: repeated measure Anova") 
Weigth: repeated measure Anova
Stratum Term df SS MSQ F P-value
subject Group 1.000 5.014 5.014 1.478 0.237
subject Residuals 22.000 74.639 3.393 NA NA
subject:Time Time 2.000 90.861 45.431 101.070 0.000
subject:Time Time:Group 2.000 1.361 0.681 1.514 0.231
subject:Time Residual 44.000 19.778 0.449 NA NA

We can’t reject the null hypothesis that all means are equal across groups.
We reject the null hypothesis that all means are equal across Time.
We can’t reject the null hypothesis about NO interaction between Time and Group.

ANOVA IS A LINEAR MODEL


One-way ANOVA: \[y \sim x_1 + \text{error} \]

Two-way ANOVA: \[y \sim x_1 + x_2 + x_1:x_2 +\text{error} \]

Two-way ANOVA repeated measures: \[y \sim x_1 + x_2 + \text{(1|Subject)} \]

Final messages

  • ANOVA is a powerful tool for analyzing experimental designs.

  • The assumption of heterogeneity is key.

  • Normality? Not so much…

  • Post hoc analysis: handle with care.

  • Surprise: ANOVA is a linear model!


📅 Thanks & See You Next Week! 👋